Protein Name Tagging for Biomedical Annotation in Text

Authors

  • Kaoru Yamamoto
  • Taku Kudo
  • Akihiko Konagaya
  • Yuji Matsumoto

Abstract

We explore the use of morphological analysis as preprocessing for protein name tagging. Our method finds protein names by chunking on morphemes, the smallest units determined by the morphological analysis, which helps it recognize the exact boundaries of protein names. Moreover, our morphological analyzer can handle compounds, which offers a simple way to adapt name descriptions from biomedical resources for language processing. Using GENIA corpus 3.01, our method attains an f-score of 70 points for protein molecule names and 75 points for protein names including molecules, families, and domains.
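
To make the boundary argument concrete, here is a minimal Python sketch; it is not the authors' system. The regular-expression splitter `segment` is a hypothetical stand-in for the morphological analyzer, the B/I/O chunk labels are an assumed encoding, and the tags are written by hand rather than predicted by a trained chunker. `f_score` assumes exact-boundary matching, where a predicted span counts only if both its start and end agree with a gold span.

```python
import re
from typing import List, Set, Tuple

Span = Tuple[int, int]  # character offsets [start, end)

# Hypothetical splitter standing in for a real morphological analyzer:
# it separates runs of letters, runs of digits, and single symbols.
_UNIT = re.compile(r"[A-Za-z]+|[0-9]+|[^\sA-Za-z0-9]")


def segment(text: str) -> List[Tuple[str, int, int]]:
    """Return (unit, start, end) triples for the sub-word units of text."""
    return [(m.group(), m.start(), m.end()) for m in _UNIT.finditer(text)]


def decode_bio(units: List[Tuple[str, int, int]], tags: List[str]) -> Set[Span]:
    """Turn per-unit B/I/O tags into character spans of chunks."""
    spans: Set[Span] = set()
    start = end = None
    for (_, s, e), tag in zip(units, tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new chunk begins; close the previous one, if any.
            if start is not None:
                spans.add((start, end))
            start, end = s, e
        elif tag == "I":
            end = e  # extend the open chunk
        else:  # "O": outside any chunk
            if start is not None:
                spans.add((start, end))
            start = None
    if start is not None:
        spans.add((start, end))
    return spans


def f_score(predicted: Set[Span], gold: Set[Span]) -> float:
    """Exact-boundary F1: harmonic mean of span precision and recall."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

On "IL-2-mediated activation", whitespace tokenization yields the single token "IL-2-mediated", so a word-level tagger can at best output the span (0, 13), which scores 0 against the gold span (0, 4). At the sub-word level, tagging the units "IL", "-", "2" as B, I, I and the rest as O recovers exactly (0, 4), that is, "IL-2".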

Similar resources

Tagging gene and protein names in biomedical text

MOTIVATION The MEDLINE database of biomedical abstracts contains scientific knowledge about thousands of interacting genes and proteins. Automated text processing can aid in the comprehension and synthesis of this valuable information. The fundamental task of identifying gene and protein names is a necessary first step towards making full use of the information encoded in biomedical text. This ...

Real-time tagging of biomedical entities

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labor intensive. We have developed a fast dictionary-based named entity recognition system, which is used for both real-time and bulk processing of text in a variety of biomedical web resources. We propose to adapt the system to make it interoperable with the PubAnnotation and Open Annotat...

The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text

In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful considerat...

NCBO Annotator: Semantic Annotation of Biomedical Data

The National Center for Biomedical Ontology Annotator is an ontology-based web service for annotation of textual biomedical data with biomedical ontology concepts. The biomedical community can use the Annotator service to tag datasets automatically with concepts from more than 200 ontologies coming from the two most important sets of biomedical ontology & terminology repositories: the UMLS Metat...

Tagging gene and protein names in full text articles

Current information extraction efforts in the biomedical domain tend to focus on finding entities and facts in structured databases or MEDLINE abstracts. We apply a gene and protein name tagger trained on Medline abstracts (ABGene) to a randomly selected set of full text journal articles in the biomedical domain. We show the effect of adaptations made in response to the greater heterogeneity o...

Publication date: 2003